
    Quantifying alternative splicing from paired-end RNA-sequencing data

    RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study gene alternative splicing. The problem has important implications for human health, as alternative splicing may be involved in malfunctions at the cellular level and in multiple diseases. However, the high-dimensional nature of the data and the existence of experimental biases pose serious data analysis challenges. We find that the standard data summaries used to study alternative splicing are severely limited, as they ignore a substantial amount of valuable information. Current data analysis methods are based on such summaries and are hence suboptimal. Further, they have limited flexibility in accounting for technical biases. We propose novel data summaries and a Bayesian modeling framework that overcome these limitations and determine biases in a nonparametric, highly flexible manner. These summaries adapt naturally to the rapid improvements in sequencing technology. We provide efficient point estimates and uncertainty assessments. The approach allows one to study alternative splicing patterns for individual samples and can also be the basis for downstream analyses. We found a severalfold improvement in estimation mean square error compared to popular approaches in simulations, and substantially higher consistency between replicates in experimental data. Our findings indicate the need for adjusting the routine summarization and analysis of alternative splicing RNA-seq studies. We provide a software implementation in the R package casper. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/13-AOAS687. With correction.

    A Bayesian time-varying autoregressive model for improved short‐term and long‐term prediction

    Motivated by the application to German interest rates, we propose a time‐varying autoregressive model for short‐term and long‐term prediction of time series that exhibit temporary nonstationary behavior but are assumed to mean revert in the long run. We use a Bayesian formulation to incorporate prior assumptions on the mean reverting process in the model and thereby regularize predictions in the far future. We use MCMC‐based inference by deriving relevant full conditional distributions and employ a Metropolis‐Hastings within Gibbs sampler approach to sample from the posterior (predictive) distribution. By combining data‐driven short‐term predictions with long‐term distribution assumptions, our model is competitive with the existing methods in the short horizon while yielding reasonable predictions in the long run. We apply our model to interest rate data and contrast its forecasting performance with that of a 2‐Additive‐Factor Gaussian model as well as with the predictions of a dynamic Nelson‐Siegel model. Peer reviewed.
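
    The long-run mean reversion that the prior is meant to encode can be illustrated with the simplest mean-reverting model, a stationary AR(1) process with a constant coefficient. This Python sketch is not the authors' time-varying Bayesian model and performs no posterior sampling; the long-run mean `mu`, the coefficient `phi`, and the starting value are hypothetical. It only shows why forecasts of a mean-reverting process converge to the long-run mean as the horizon grows, which is the far-future behaviour the prior regularizes.

```python
import numpy as np

def ar1_forecast(y_last: float, mu: float, phi: float, horizons: int) -> np.ndarray:
    """Conditional means E[y_{t+h} | y_t] for h = 1..horizons of a stationary AR(1)
    process y_t - mu = phi * (y_{t-1} - mu) + eps_t with |phi| < 1."""
    if not abs(phi) < 1:
        raise ValueError("mean reversion requires |phi| < 1")
    h = np.arange(1, horizons + 1)
    # The deviation from mu shrinks by a factor phi each step, so forecasts
    # converge geometrically to the long-run mean mu.
    return mu + phi ** h * (y_last - mu)

# Hypothetical values: current level 0.5, long-run mean 3.0, persistence 0.9.
fc = ar1_forecast(y_last=0.5, mu=3.0, phi=0.9, horizons=120)
print(fc[0], fc[-1])   # 0.75 at h = 1; essentially 3.0 at h = 120
```

    In a time-varying or Bayesian version, `phi` may change over time and `mu` receives a prior distribution, so the long-horizon forecasts inherit that prior instead of an extrapolated trend.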

    Elastic analysis of irregularly or sparsely sampled curves

    We provide statistical analysis methods for samples of curves in two or more dimensions, where the image, but not the parameterization of the curves, is of interest and suitable alignment/registration is thus necessary. Examples are handwritten letters, movement paths, or object outlines. We focus in particular on the computation of (smooth) means and distances, allowing, for example, classification or clustering. Existing parameterization-invariant analysis methods based on the elastic distance of the curves modulo parameterization, using the square‐root‐velocity framework, have limitations in common realistic settings where curves are irregularly and potentially sparsely observed. We propose using spline curves to model smooth or polygonal (Fréchet) means of open or closed curves with respect to the elastic distance and show identifiability of the spline model modulo parameterization. We further provide methods and algorithms to approximate the elastic distance for irregularly or sparsely observed curves, via interpreting them as polygons. We illustrate the usefulness of our methods on two datasets. The first application classifies irregularly sampled spirals drawn by Parkinson's patients and healthy controls, based on the elastic distance to a mean spiral curve computed using our approach. The second application clusters sparsely sampled GPS tracks based on the elastic distance and computes smooth cluster means to find new paths on the Tempelhof field in Berlin. All methods are implemented in the R‐package “elasdics” and evaluated in simulations. Peer reviewed.
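
    The square-root-velocity (SRV) representation maps a curve c to q = c'/√|c'|. For a polygon parameterized proportionally to arc length, q is constant on each segment, so the L2 distance between two SRV curves reduces to a sum over merged breakpoints. The Python sketch below computes only that L2 distance for the fixed arc-length parameterizations. It does not optimize over reparameterizations, so it is an upper bound on (not equal to) the elastic distance, and it is not the algorithm implemented in elasdics; the function names are hypothetical.

```python
import numpy as np

def srv_polygon(vertices):
    """SRV q = c' / sqrt(|c'|) of a polygon parameterized proportionally to
    arc length on [0, 1]. Returns the n + 1 breakpoints and the constant q on
    each of the n segments."""
    p = np.asarray(vertices, dtype=float)
    seg = np.diff(p, axis=0)                    # segment vectors, shape (n, d)
    lengths = np.linalg.norm(seg, axis=1)
    if np.any(lengths == 0):
        raise ValueError("consecutive vertices must differ")
    widths = lengths / lengths.sum()            # parameter interval per segment
    breaks = np.concatenate([[0.0], np.cumsum(widths)])
    breaks[-1] = 1.0                            # guard against rounding
    velocity = seg / widths[:, None]            # constant speed = total length
    speed = np.linalg.norm(velocity, axis=1)
    return breaks, velocity / np.sqrt(speed)[:, None]

def srv_l2_distance(vertices_a, vertices_b):
    """L2 distance between the SRVs of two polygons for their fixed arc-length
    parameterizations (no optimization over reparameterizations)."""
    ba, qa = srv_polygon(vertices_a)
    bb, qb = srv_polygon(vertices_b)
    grid = np.union1d(ba, bb)                   # merged breakpoints
    mids = 0.5 * (grid[:-1] + grid[1:])
    # Index of the segment containing each sub-interval, for each curve.
    ia = np.searchsorted(ba, mids, side="right") - 1
    ib = np.searchsorted(bb, mids, side="right") - 1
    sq = np.sum((qa[ia] - qb[ib]) ** 2, axis=1)
    return float(np.sqrt(np.sum(sq * np.diff(grid))))

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
print(srv_l2_distance(square, square + 5.0))   # 0.0: the SRV ignores translation
```

    A useful check is that the squared L2 norm of q equals the curve length, since |q|² = |c'|; the elastic distance would additionally minimize this distance over warpings of one curve's parameterization.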

    Boosting Functional Response Models for Location, Scale and Shape with an Application to Bacterial Competition

    We extend Generalized Additive Models for Location, Scale, and Shape (GAMLSS) to regression with functional response. This allows us to simultaneously model point-wise mean curves, variances and other distributional parameters of the response depending on various scalar and functional covariate effects. In addition, the scope of distributions is extended beyond exponential families. The model is fitted via gradient boosting, which offers inherent model selection and is shown to be suitable for both complex model structures and highly auto-correlated response curves. This enables us to analyze bacterial growth in Escherichia coli in a complex interaction scenario, fruitfully extending usual growth models. Comment: bootstrap confidence interval type uncertainty bounds added; minor changes in formulation.
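
    The fitting principle can be illustrated on a much simpler scalar case: cyclic gradient boosting of a Gaussian location-scale model, in which each iteration fits a least-squares base-learner to the negative gradient of the negative log-likelihood with respect to one distribution parameter and adds a shrunken copy of the fit. This Python sketch assumes a scalar response, a single covariate, linear base-learners, a fixed step length `nu`, and a fixed number of iterations. It omits functional responses, component-wise selection among several base-learners, and early stopping (the source of boosting's inherent model selection), so it is not the authors' implementation; the function name and simulated data are hypothetical.

```python
import numpy as np

def boost_gaussian_lss(x, y, n_iter=1000, nu=0.1):
    """Cyclic gradient boosting for y ~ N(mu(x), sigma(x)^2) with
    mu(x) = a0 + a1 * x and log sigma(x) = b0 + b1 * x (linear base-learners)."""
    X = np.column_stack([np.ones_like(x), x])
    coef_mu = np.array([y.mean(), 0.0])            # offset: overall mean
    coef_ls = np.array([np.log(y.std()), 0.0])     # offset: log overall SD
    for _ in range(n_iter):
        sigma = np.exp(X @ coef_ls)
        # Location step: the negative gradient of the negative log-likelihood
        # w.r.t. mu is (y - mu) / sigma^2; fit it by least squares, add nu * fit.
        u_mu = (y - X @ coef_mu) / sigma**2
        coef_mu += nu * np.linalg.lstsq(X, u_mu, rcond=None)[0]
        # Scale step (uses the freshly updated location): the negative gradient
        # w.r.t. log sigma is ((y - mu) / sigma)^2 - 1.
        u_ls = ((y - X @ coef_mu) / sigma) ** 2 - 1.0
        coef_ls += nu * np.linalg.lstsq(X, u_ls, rcond=None)[0]
    return coef_mu, coef_ls

# Simulated data (hypothetical): mean 1 + 2x, log-SD -1 + x.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 2000)
y = 1.0 + 2.0 * x + np.exp(-1.0 + x) * rng.standard_normal(2000)
coef_mu, coef_ls = boost_gaussian_lss(x, y)
print(coef_mu, coef_ls)   # should be close to [1, 2] and [-1, 1]
```

    Because each base-learner is linear, accumulating `nu` times its coefficients is the same as adding `nu` times its fitted values to the predictor; stopping early instead of running to convergence shrinks the estimates towards the offsets.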

    Multivariate functional additive mixed models

    Multivariate functional data can be intrinsically multivariate, like movement trajectories in 2D, or complementary, such as precipitation, temperature and wind speeds over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by, for example, repeated measurements or crossed study designs. Modelling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improve the estimation of random effects, and yield corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modelling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit. Peer reviewed.

    Quantifying alternative splicing from paired-end RNA-sequencing data

    RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study gene alternative splicing. The problem has important implications for human health, as alternative splicing is involved in malfunctions at the cellular level and in multiple diseases. However, the high-dimensional nature of the data and the existence of experimental biases pose serious data analysis challenges. We find that the standard data summaries used to study alternative splicing are severely limited, as they ignore a substantial amount of valuable information. Current data analysis methods are based on such summaries and are hence sub-optimal. Further, they have limited flexibility in accounting for technical biases. We propose novel data summaries and a Bayesian modeling framework that overcome these limitations and determine biases in a non-parametric, data-dependent manner. These summaries adapt naturally to the rapid improvements in sequencing technology. We provide efficient point estimates and uncertainty assessments. The approach allows one to study alternative splicing patterns for individual samples and can also be the basis for downstream differential expression analysis. We found an over fivefold improvement in estimation mean square error compared to a popular approach in simulations, and substantially higher correlations between replicates in experimental data. Our findings indicate the need for modifying the routine summarization and analysis of alternative splicing RNA-seq studies. We provide a software implementation in the R package casper.

    The effect of rapid relative humidity changes on fast filter-based aerosol-particle light-absorption measurements: Uncertainties and correction schemes

    Measuring vertical profiles of the particle light-absorption coefficient by using absorption photometers may face the challenge of fast changes in relative humidity (RH). These absorption photometers determine the particle light-absorption coefficient from the change in light attenuation through a particle-loaded filter. The filter material, however, takes up or releases water with changing relative humidity (RH in %), thus influencing the light attenuation. A sophisticated set of laboratory experiments was therefore conducted to investigate the effect of fast RH changes (dRH/dt) on the particle light-absorption coefficient (σabs in Mm-1) derived with two absorption photometers. The RH dependence was examined based on different filter types and filter loadings with respect to loading material and areal loading density. The Single Channel Tricolor Absorption Photometer (STAP) relies on a quartz-fiber filter, and the microAeth® MA200 is based on a polytetrafluoroethylene (PTFE) filter band. Furthermore, three cases were investigated: clean filters, filters loaded with black carbon (BC), and filters loaded with ammonium sulfate. The filter areal loading densities (ρ∗) ranged from 3.1 to 99.6 mg m-2 for the STAP with ammonium sulfate and from 1.2 to 37.6 mg m-2 for the MA200. For the BC-loaded cases, the BC areal loading density (ρ∗BC) was in the range of 2.9 to 43.0 and 1.1 to 16.3 mg m-2 for the STAP and MA200, respectively. Both instruments revealed opposing responses to relative humidity changes (ΔRH), with different magnitudes. The STAP shows a linear dependence on relative humidity changes. The MA200 is characterized by a distinct exponential recovery after its filter was exposed to relative humidity changes. 
At a wavelength of 624 nm and for the default 60 s running average output, the STAP reveals an absolute change in σabs per absolute change in RH (Δσabs/ΔRH) of 0.14 Mm-1 %-1 in the clean case, 0.29 Mm-1 %-1 in the case of BC-loaded filters, and 0.21 Mm-1 %-1 in the case of filters loaded with ammonium sulfate. The 60 s running average of the particle light-absorption coefficient at 625 nm measured with the MA200 revealed a response of around -0.4 Mm-1 %-1 for all three cases. Whereas the response of the STAP varied across the different loading materials, the response of the MA200 was quite stable. For the STAP, the response was in the range of 0.17 to 0.24 Mm-1 %-1 in the case of ammonium sulfate loading and 0.17 to 0.62 Mm-1 %-1 in the BC-loaded case. For the MA200, the response ranged from -0.42 to -0.36 Mm-1 %-1 in the ammonium sulfate case and from -0.42 to -0.37 Mm-1 %-1 in the case of BC. A linear correction function for the STAP was developed here. It was derived by correlating 1 Hz resolved recalculated particle light-absorption coefficients with RH change rates. The linear response is estimated at 10.08 Mm-1 s %-1. A correction approach for the MA200 is also provided; however, the behavior of the MA200 is more complex. Further research and multi-instrument measurements have to be conducted to fully understand the underlying processes, since the correction approach resulted in different correction parameters across various experiments. Nevertheless, the exponential recovery of the MA200 after its filter experienced an RH change could be reproduced. However, the given correction approaches have to be evaluated with other RH sensors as well, since each sensor has a different response time. Moreover, the uncertainties of the given correction approaches could not be estimated, mainly due to the response time of the RH sensor. Therefore, we do not recommend using the given approaches. 
Nevertheless, they point in the right direction, and despite their imperfections, they are useful for at least estimating the measurement uncertainties due to relative humidity changes. Based on our findings, we recommend using an aerosol dryer upstream of absorption photometers to reduce the RH effect significantly. Furthermore, when absorption photometers are used for vertical measurements, the ascent or descent speed through layers with large relative humidity gradients has to be low to minimize the observed RH effect. However, this is not possible in some scenarios, especially in unmixed layers or clouds. Additionally, recording the RH of the sample stream allows correcting for the bias during post-processing of the data. According to the example given in this study, this correction leads to reasonable results. © Author(s) 2019
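
    The STAP correction described in this abstract (1 Hz absorption coefficients regressed on the RH change rate, slope 10.08) can be sketched as a subtraction of the fitted RH-rate term. This Python sketch is not the authors' published correction function: the additive model, the sign convention (a positive slope, matching the positive STAP response), and the reading of the slope's unit as Mm-1 per (% s-1) are all assumptions, and the synthetic data are hypothetical. As the abstract notes, such a correction is only as good as the RH sensor's response time.

```python
import numpy as np

# Slope quoted in the abstract. ASSUMPTION: read as Mm^-1 per (% s^-1), i.e. the
# change in sigma_abs per unit RH change rate, so k * dRH/dt is in Mm^-1.
K_STAP = 10.08

def correct_stap(sigma_abs, rh, dt=1.0, k=K_STAP):
    """Subtract a linear RH-change artefact from 1 Hz STAP absorption coefficients.

    ASSUMPTION (hypothetical model): sigma_measured = sigma_true + k * dRH/dt.
    sigma_abs: measured absorption coefficients in Mm^-1, sampled every dt seconds.
    rh: relative humidity of the sample stream in %, on the same time base.
    """
    sigma_abs = np.asarray(sigma_abs, dtype=float)
    drh_dt = np.gradient(np.asarray(rh, dtype=float), dt)   # % per second
    return sigma_abs - k * drh_dt

# Synthetic example: constant true signal of 5 Mm^-1, RH rising at 0.2 % per second.
t = np.arange(60.0)
rh = 40.0 + 0.2 * t
measured = np.full_like(t, 5.0 + K_STAP * 0.2)
corrected = correct_stap(measured, rh)
print(corrected.min(), corrected.max())   # both approximately 5.0
```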

    Pedestrian exposure to black carbon and PM2.5 emissions in urban hot spots: new findings using mobile measurement techniques and flexible Bayesian regression models

    Background: Data from extensive mobile measurements (MM) of air pollutants provide spatially resolved information on pedestrians’ exposure to particulate matter (black carbon (BC) and PM2.5 mass concentrations). Objective: We present a distributional regression model in a Bayesian framework that estimates the effects of spatiotemporal factors on the pollutant concentrations influencing pedestrian exposure. Methods: We modeled the mean and variance of the pollutant concentrations obtained from MM in two cities and extended commonly used lognormal models with a lognormal-normal convolution (logNNC) for BC to account for instrument measurement error. Results: The logNNC extension significantly improved the BC model. From these model results, we found that local sources, and hence local mitigation efforts to improve air quality, have more impact on the ambient levels of BC mass concentrations than on the regulated PM2.5. Significance: First, this model (logNNC, available in the R package bamlss) could be used for the statistical analysis of MM data from various study areas and pollutants, with the potential for predicting pollutant concentrations in urban areas. Second, with respect to pedestrian exposure, it is crucial for BC mass concentration to be monitored and regulated in areas dominated by traffic-related air pollution.
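
    A lognormal-normal convolution can be read as an observed concentration Y = X + ε, where the true concentration X is lognormal and ε is independent Gaussian instrument error; the density of Y then has no closed form and must be computed numerically. The Python sketch below evaluates that density by integrating over log X with the trapezoidal rule. This reading of the logNNC model, the parameter names `mu`, `sigma`, `tau`, and the grid settings are assumptions for illustration; it neither uses nor reproduces the bamlss implementation.

```python
import numpy as np

def lognormal_normal_pdf(y, mu, sigma, tau, n_grid=2001):
    """Density of Y = X + eps with X ~ LogNormal(mu, sigma^2) and independent
    eps ~ N(0, tau^2), by the trapezoidal rule over z = log X."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    z = np.linspace(mu - 8.0 * sigma, mu + 8.0 * sigma, n_grid)
    dz = z[1] - z[0]
    # Normal density of z = log X (substituting x = exp(z) absorbs the Jacobian).
    w = np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Standardized measurement error y - X for every (y, z) pair.
    r = (y[:, None] - np.exp(z)[None, :]) / tau
    integrand = w[None, :] * np.exp(-0.5 * r**2) / (tau * np.sqrt(2.0 * np.pi))
    # Trapezoidal rule along the z axis (one row per y value).
    return dz * (integrand.sum(axis=1) - 0.5 * (integrand[:, 0] + integrand[:, -1]))

ys = np.linspace(-10.0, 60.0, 1401)
dens = lognormal_normal_pdf(ys, mu=1.0, sigma=0.5, tau=0.5)
dy = ys[1] - ys[0]
print(dens.sum() * dy)                         # approximately 1
print((ys * dens).sum() * dy, np.exp(1.125))   # E[Y] = E[X] = exp(mu + sigma^2 / 2)
```

    Unlike a plain lognormal model, this density is positive for negative observations, which is one reason an error-convolution term can suit instruments that occasionally report values below zero.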

    Flexible regression for functional object data: curves, shapes and densities


    Boosting functional response models for location, scale and shape with an application to bacterial competition

    We extend generalized additive models for location, scale and shape (GAMLSS) to regression with functional response. This allows us to simultaneously model point-wise mean curves, variances and other distributional parameters of the response depending on various scalar and functional covariate effects. In addition, the scope of distributions is extended beyond exponential families. The model is fitted via gradient boosting, which offers inherent model selection and is shown to be suitable for both complex model structures and highly auto-correlated response curves. This enables us to analyse bacterial growth in Escherichia coli in a complex interaction scenario, fruitfully extending usual growth models. Peer reviewed.